Using inplace=True when modifying a pandas DataFrame means that the method will modify the DataFrame in place, rather than returning a new object:
df.an_operation(inplace=True)
When inplace is False (the default), a new object is returned instead:
df2 = df.an_operation(inplace=False)
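As a minimal sketch of the difference, the snippet below uses rename as a stand-in for an_operation; the sample data and variable names are illustrative assumptions rather than part of the original text.

```python
import pandas as pd

# Illustrative sample data (assumption, not from the original text).
df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Default (inplace=False): a new DataFrame is returned and df is left untouched.
renamed = df.rename(columns={"a": "A"})

# inplace=True: df itself is mutated and the call returns None.
ret = df.rename(columns={"b": "B"}, inplace=True)

print(list(renamed.columns))  # ['A', 'b']
print(list(df.columns))       # ['a', 'B']
print(ret)                    # None
```

Because the in-place call returns None, writing df = df.rename(..., inplace=True) would silently replace the DataFrame with None, which is one common way this parameter causes bugs.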
Generally speaking, the motivation for modifying an object in place is to improve efficiency by avoiding a copy of the original object. Unfortunately, many methods that accept the inplace keyword either cannot actually operate in place, or make a copy as a consequence of the operations they perform, regardless of whether inplace is True or not. For example, the following methods can never operate in place:
- drop (dropping rows)
- dropna
- drop_duplicates
- sort_values
- sort_index
- eval
- query
Because of this, inplace=True cannot be relied on to improve efficiency.
Additionally, using inplace=True may trigger a SettingWithCopyWarning and make the overall intention of the code unclear.
In the following example, modifying df2 will not modify the original df dataframe, and a warning will be raised:
import pandas as pd
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
In general, side effects such as object mutation can be a source of subtle bugs, and explicit reassignment is considered safer.
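One way to express the same replacement with explicit reassignment is sketched below. Calling .copy() on the filtered frame is an assumption about the intent here (that df2 is meant to be independent of df); it is not stated in the original example.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 2, 1], "b": ["x", "y", "z"]})

# Take an explicit copy so later assignments clearly target df2 only.
df2 = df[df["a"] > 1].copy()

# Reassign the column instead of mutating it with inplace=True.
df2["b"] = df2["b"].replace({"x": "abc"})

print(df2["b"].tolist())  # ['abc', 'y']
print(df["b"].tolist())   # ['x', 'y', 'z']
```

The assignment makes it visible which object changes, and the original df keeps its values.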
When intermediate results are not needed, method chaining is a more explicit alternative to the inplace parameter. For instance, one may write:
df.drop('City', axis=1, inplace=True)
df.sort_values('Name', inplace=True)
df.reset_index(drop=True, inplace=True)
With method chaining, the previous example can be rewritten as:
result = df.drop('City', axis=1).sort_values('Name').reset_index(drop=True)
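A runnable version of the chained form might look like the following sketch. The Name and City values are hypothetical sample data, and the parenthesized multi-line layout is only a readability choice.

```python
import pandas as pd

# Hypothetical sample data with the column names used in the example.
df = pd.DataFrame(
    {"Name": ["Zoe", "Adam", "Mia"], "City": ["Oslo", "Rome", "Lima"]}
)

# Each step returns a new DataFrame that feeds the next call.
result = (
    df.drop("City", axis=1)
      .sort_values("Name")
      .reset_index(drop=True)
)

print(result["Name"].tolist())  # ['Adam', 'Mia', 'Zoe']
print(list(df.columns))         # ['Name', 'City']
```

The original df still has both columns in their original order, so it remains available if it is needed later.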
For these reasons, it is recommended to avoid inplace=True in favor of more explicit and less error-prone alternatives.